Rapid algorithm for identifying backbones in the two-dimensional percolation model 
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We present a rapid algorithm for identifying the current-carrying backbone in the percolation model. 
It applies to general two-dimensional graphs with open boundary conditions. Complemented by 
the modified Hoshen-Kopelman cluster labeling algorithm, our algorithm identifies dangling parts 
using their local properties. For planar graphs, it finds the backbone almost four times as fast as 
Tarjan's depth-first-search algorithm, and uses the memory of the same size as the modified Hoshen- 
Kopelman algorithm. Comparison with other algorithms for backbone identification is addressed. 



PACS numbers: 64.60.Ak, 02.70.-c, 05.10.-a 

The percolation model describes a system consisting 
of randomly distributed "conducting" cells and "isolat- 
ing" cells Q. When the density of the conducting cells 
exceeds a threshold value p c , a cluster of connected con- 
ducting cells spans the system. This model has become 
one of the most extensively studied statistical models 
because of its simplicity and its often surprising appli- 
cability. Its relevance to physics includes, for example, 
the metal-insulator transition observed in disordered sys- 
tems Numerical simulation is an important means 
in the solution of the model. For a complex system, 
which can be originated from the complexity of its ge- 
ometry or the mechanism underlying the random dis- 
tribution function, numerical simulation seems to be a 
unique method available. To get rid off severe statisti- 
cal fluctuations brought by disorder in the percolation 
model, numerical calculations, in general, need to be 
done over a large number of large scale ensembles. Find- 
ing rapid algorithms is thus an interesting topic in this 
field ||i0||@0@||@Jl|@. 

The critical behavior of percolation is well understood. 
Because of its relationship to the one-state Potts model 
jlj , all critical exponents that have thermal analogue are 
exactly known for the dimension d = 2. Those having 
no thermal analogue are usually estimated by numerical 
work. The most important of these are the backbone 
fractal dimension and the conductivity exponent. The 
backbone is defined as the subset of the spanning cluster 
that carries current when a potential difference is ap- 
plied between two sides of the system. Far away from 
Pc the conductivity of the system can be accurately cal- 
culated within the effective-medium theory PJl7[|; oth- 
erwise, numerical studies are needed. The calculation 
of the conductivity on the current-carrying backbone is 
considerably faster than on the spanning cluster [ ^2|Jl4| 
because the density of the backbone in the spanning clus- 
ter substantially decreases as the system approaches to 
p c (T§| l. Backbone identification is thus a necessary step 
before estimating the conductivity using the Lobb-Frank 



algorithm |Tl| or the multigrid algorithm fl^ , [l5| . 

The purpose of this paper is to present a fast algo- 
rithm for d = 2 backbone identification. The backbone 
is called the set of biconnected nodes in computer sci- 
ence. For general graphs, heretofore the fastest algorithm 
to identify it is Tarjan's depth-first-search algorithm j7j 
which runs in time O(N) for a graph of N nodes, pro- 
vided the number of edges meeting at each node is fi- 
nite. For d — 2 planar graphs, our algorithm is almost 
four times as fast as Tarjan's algorithm. We emphasize 
here that our algorithm applies to general d — 2 graphs 
with open boundary conditions. There are other algo- 
rithms for backbone identification 10,1^]. We will 
first describe our algorithm and then compare it with 
these known algorithms. 

Below we give the definitions of all terminologies in- 
volved and illustrate them in a square lattice having L 
sites horizontally and L + 1 sites vertically (see Figure [j]) . 
Between neighboring sites we insert a bond that is taken 
to be open with an independent probability p, or closed 
with probability 1 — p. Obviously, here is defined the 
percolation model for bonds jl9| . Open lateral boundary 
conditions are assumed, whereas all bonds locating at the 
top and the bottom are set open. A potential difference is 
applied between the top and the bottom. Two neighbor- 
ing sites are connected to each other if the bond between 
them is open. The spanning cluster is the set of sites if 
there is a path of open bonds from each of them to the 
bottom and to the top. The spanning cluster consists of 
the current-carrying backbone and non-current-carrying 
dangling parts. There are three kinds of dangling parts of 
the spanning cluster: dangling ends, dangling arcs, and 
dangling loops. A dangling end is connected to the span- 
ning cluster by only one open bond (e.g., b in Figure [l]); 
A dangling arc disconnects with the top (bottom) but is 
connected to the bottom (top) by at least two open bonds 
(e.g., c's in Figure [l]); A dangling loop is connected by at 
least two open bonds to only one site of the spanning 
cluster (e.g., a in Figure 0). Here we name this site a 
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dangling-loop-connecting site, which is called an articula- 
tion node in computer science. After all dangling parts 
are removed, the remainder of the spanning cluster is the 
backbone. 

Our algorithm is based on Jordan's curve theorem: 

Suppose there is a loop in a two-dimensional graph, then 
the area inside the loop disconnects with the area outside, 
and vise versa. 

To make use of the theorem, we introduce the concept 
of plaguette that is an as small as possible area enclosed 
by a loop of bonds which can be either open or closed. 
For example, any one of the smallest square areas in Fig- 
ure |l|. We define that two nearest neighboring plaquettes 
are connected if the bond between them is closed. Note 
that the definition of connectedness for plaquettes is op- 
posite to that for sites. Jordan's curve theorem tells us 
that any dangling part must be enclosed by a loop of 
connected plaquettes if it is split from the spanning clus- 
ter. Hence, by use of the connectedness of plaquettes, 
we are able to identify dangling parts according to the 
following three corollaries of the theorem, which describe 
the local properties of the three kinds of dangling parts, 
respectively. We will directly use them to design our al- 
gorithm. 

Corollary 1. An open bond connects a dangling end 
to the spanning cluster if and only if two plaquettes be- 
side the bond are connected. For example, b in Figure [l]. 

Corollary 2. If two non-neighboring plaquettes near- 
est neighboring the top (bottom) are connected, all sites 
that are connected to the top (bottom) by the bonds be- 
tween the two plaquettes belong to dangling arcs. For 
example, c in Figure |j} 

Corollary 3. Let all dangling ends and arcs have been 
removed. Site a is a dangling-loop-connecting site if and 
only if (i) a connects at least four open bonds, referred 
to as bonds 1,2,3,4, and (ii) among its neighboring pla- 
quettes, some between two of the four bonds (e.g., bonds 
1 and 2) is connected by a path of plaquettes to some 
others between the other two bonds (i.e., bonds 3 and 
4). For example, a in Figure [|. a's northern, eastern, 
southern, and western open bonds correspond to bonds 
1, 2, 3, 4, respectively, a's northeast plaquette which is 
between bonds 1 and 2 is connected by a path of plaque- 
ttes to a's southwest plaquette which is between bonds 
3 and 4. Note that here bonds 1 and 4 belong to the 
backbone, bonds 2 and 3 belong to a dangling loop. 

Now let us describe the procedure of our algorithm. In 
accordance with open lateral boundary conditions, the 
left (right) plaquettes of the leftmost (rightmost) column 
of vertical bonds are set to be the same, respectively, as 
shown in Figure [l]. The algorithm is carried out as follows 
(see Figure ||). 

Step 1. Compute the connectedness of plaquettes. 
If the leftmost plaquette is connected to the rightmost 
plaquette by a path of plaquettes, the spanning cluster 
does not exist 20 1. 

Step 2 for identifying dangling ends. Sweep the bond 
graph to close any bond whose two-side plaquettes have 



the same label. This is the application of Corollary 1. 

Step 3 for identifying dangling arcs. If two plaquettes 
neighboring the top (or the bottom) have the same label, 
then close all bonds between the two plaquettes. This is 
the application of Corollary 2. 

Step 4 for identifying dangling loops. We look 
for dangling-loop-connecting sites (a) using Corollary 3. 
That is, a connects four open bonds, referred to as bonds 
1,2,3,4, and a plaquette between bond 1 and 2 has the 
same label as a plaquette between bond 3 and 4. Note 
that a's themselves belong to the backbone. We use the 
following trick to split dangling loops: We create a new 
site a' and let bonds 2 and 3 be connected to a' instead 
of a. As a result, the subgraph connected to bonds 2 and 
3 is disconnected to that connected to bonds 1 and 4. 

Step 5. Compute the connectedness of sites including 
those newly created sites (a'). The resultant spanning 
cluster is exactly the backbone. 

In this paper, the connectedness of plaquettes or sites 
is calculated by using the modified Hoshen-Kopelman 
cluster-labeling algorithm flH. The data structure of 
this algorithm includes a bond status array that records 
the status of any bond [ pl| , a site label array that records 
the label of any site (connected sites will have the same 
label), and a bond-site array that records two end sites 
of any bond. Our algorithm needs two extra arrays to 
record the information of plaquettes. One is a plaquette 
label array that records the label of any plaquette (con- 
nected plaquettes will have the same label), the other is 
a bond-plaquette array that records two side plaquettes 
of any bond. Given this data structure, we can compute 
both the connectedness of sites and the connectedness of 
plaquettes. The processes of the algorithm and the usage 
of the arrays are summarized in Figure |^. Only the bond 
status array is used in all steps. It is clear that if the 
memory is the bottle-neck of calculations, the bond-site 
array and the bond-plaquette array can occupy the same 
memory, so can the site label array and the plaquette ar- 
ray. The reasons are (i) they are not used simultaneously 
and (ii) they are used sequentially. The bond-plaquette 
array is used in Step 1-3 while the bond-site array is used 
in Step 4-5. The plaquette label array is used in Step 1-4 
while the site label array is used in Step 5. 

Figure |^ shows one of the simplest d — 2 nonplanar 
graphs: a square lattice with both nearest neighboring 
(nn) bonds and next nearest neighboring (nnn) bonds. 
This type of nonplanar graphs is important because it is 
often used to mimic the effects of higher dimensionality 
fl5[ . Here a plaquette is a smallest triangular area. While 
a nn bond still has two side plaquettes, a nnn bond has 
four which are divided into two pairs by another nnn 
bond. If any pair of side plaquettes of a nnn bond have 
the same label, the nnn bond can be closed in Step 2. 
The number of bonds (plaquettes) in this case is twice 
(four times) as large as that in the corresponding planar 
square lattice with nn bonds only. 

The efficiency of our algorithm is demonstrated in Fig- 
ure |] for square lattices with nn bonds only, where the 
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central-processing-unit (CPU) time needed for the simu- 
lation is plotted versus the number of sites. We consider 
bond percolation at p = p^ = 0.5 || and site percola- 
tion [jn§ at p = pi s) = 0.592745(2) f2§. The results are 
obtained by averaging over 1000 configurations |2l[] for 
linear size L = 100, 200, 300, 500, 1000, 1500, 2000, 2500, 
respectively. Results for systems of larger size and more 
configurations will be reported soon. The calculations 
are carried out on a Pentium 11/233 processor with Red 
Hat Linux release 4.2 operating system and GNU project 
F77 Compiler (v0.5.18). As shown in Figure^, the dotted 
lines are our fits to the data for three algorithms. Their 
slopes are 0.51, 1.21, and 4.62 for the modified Hoshen- 
Kopclman algorithm, our new algorithm, and Tarjan's 
depth-first-search algorithm, respectively. All the three 
algorithms run in time proportional to the system size. 
The total time needed to find the backbone is about twice 
as many as that to compute the connected cluster, since 
we use the modified Hoshen-Kopelman algorithm twice in 
the former and once in the latter. The present algorithm 
finds the current-carrying backbone almost four times as 
fast as Tarjan's depth-first-search algorithm. Note that 
in our algorithm finding the backbone is not subject to 
finding the spanning cluster first. An advantage of the 
new algorithm is that it is easy to program and maintain 
because it involves only three local properties of dan- 
gling parts and uses the well-known modified Hoshen- 
Kopelman algorithm. The codes of the algorithm are 
available from us. 

There are other algorithms for backbone identification 
|,|§Mra. A brief review of them was recently given 
in Ref. ]12|. Here we make a comparison between ours 
and these known algorithm. The traditionally used algo- 
rithm by physicists is the burning algorithm ||k|, which 
is at least for large N much slower than Tarjan's algo- 
rithm The matching algorithm with complexity 
slight larger than AjO(A^ 107 ) for d = 2] was used re- 
cently in literature j8|. For strictly planar graphs, the 
hull-generating algorithm is even faster [p|fTc|] . This al- 
gorithm has the same asymptotic complexity as Tarjan's 
algorithm, but it is roughly twice as fast and uses about 
half of the memory, since it needs one data structure less 
and needs only one pass through all sites, instead of two 
passes in Tarjan's algorithm J12[ . For planar graphs, our 
algorithm is almost four times as fast as Tarjan's algo- 
rithm and uses the memory of the same size as the mod- 
ified Hoshen-Kopelman algorithm. For site percolation 
p9[ , a modified version of our algorithm uses the mem- 
ory of even smaller size where the bond-site array and the 
bond-plaquette array are discarded |l6| . Since the hull- 
generating algorithm, Tarjan's, and ours have the same 
asymptotic complexity, their difference in speed is deter- 
mined by the complexity of inner operations. Note that 
the identification of the backbone or the spanning clus- 
ter involves the global properties of sites. Our algorithm 
has an advantage such that it is able to identify dangling 
parts using their local properties complemented by the 



most efficient algorithm computing the connectedness of 
sites, which is currently the modified Hoshen-Kopelman 
algorithm. The modified Hoshen-Kopelman algorithm 
was originally used to identify the spanning cluster [|J. 
Because of its simplicity, the geometrical properties of 
the spanning cluster have been numerically studied on 
very large systems containing as much as 10 11 sites [[|. 
Therefore, the inner operations of our algorithm are sim- 
pler than those of Tarjan's and hull-generating. It should 
be made clear that the algorithms of Tarjan, matching, 
and burning can be used for arbitrary graphs; the hull- 
generating algorithm is valid for strictly planar graphs 
only; ours applies to general d — 2 graphs with open lit- 
eral boundary conditions, and thus has a wider range of 
application than the hull-generating algorithm. 

The exact finding of the spanning cluster and the back- 
bone enables us to calculate some scaling exponents for 
percolation. The order parameter of percolation is P^, 
the ratio of the sites in the spanning cluster to all sites in 
the system. Correspondingly, we define Q^, the ratio of 
the sites in the backbone to all sites in the system. These 
quantities scale with linear size L as 

Poc{L) ~ L^ 13 ^, Q 00 (L)~L-«' V , fori ^oo (1) 

where /3, q, and v are the critical exponents of Poo, Qoo> 
and the correlation length, respectively. 

We calculate these scaling exponents in our Monte 
Carlo runs. Note that the following numerics are 
listed for demonstration because we have not calcu- 
lated them on a world-record-breaking number of world- 
record-breaking scale lattices. The largest size consid- 
ered here is L — 2500 with 1000 configurations. Re- 
sults for i max = 4096 with 198470 configurations were 
reported in Ref. Q . Calculations for systems of larger 
size and more configurations are in progress. The data 
are plotted in Figure with very small standard errors 
of the mean. The lines shown in the log-log plot are the 
results of the weighted-least-squares fits, and yield the 
value (3/v = 0.1068 ± 0.0013 (close to the exact result 
(3/v = 5/48 g,f§), and q/u = 0.3647±0.0039. The calcu- 
lated exponents satisfy the Chayes' exponent inequalities 

B 

2/3<q<t, (2) 

where t is the exponent of conductivity (the best esti- 
mate for two-dimensional systems known to us is tjv = 
0.9745 ± 0.0015 @). The critical behavior of the back- 
bone is consistently closer to that of the conductivity 
than that of the spanning cluster. Since (Joo/Poo vanishes 
as p — > p Cl the unknowns in the calculation of conduc- 
tivity on the backbone is considerably less than those on 
the spanning cluster. As a result, the calculation of elec- 
tronic conductivity on the backbone has considerably less 
statistical fluctuation and much reduced critical slowing 
down than on the spanning cluster p4[ . 

Summarizing, we present a fast algorithm for identi- 
fying percolation backbones in general two-dimensional 
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graphs with open boundary conditions. Complemented 
by the modified Hoshen-Kopelman algorithm, our algo- 
rithm identifies dangling parts using their local proper- 
ties. For planar graphs, our algorithm is by far the fastest 
algorithm for backbone identification: it finds the back- 
bone almost four times as fast as Tarjan's depth-first- 
scarch algorithm, and uses the memory of the same size 
as the modified Hoshen-Kopelman algorithm. 
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FIG. 1. An illustration of percolation on a square lattice 
of linear size L = 10: site (filled circle), open bond (solid 
line), closed bond (dotted line), plaquette (smallest square 
area), a is a dangling-loop-connecting site, b is an open bond 
connecting a dangling end, and c are open bonds belonging 
to a dangling arc. Dashed lines are some paths of connected 
plaquettes enclosing some dangling parts. 
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FIG. 4. CPU time of three algorithms for bond percola- 
tion at p = pi b ^ = 0.5. The dotted lines are our fits to the 
data. Their slopes are 0.51, 1.21, and 4.62 for the modified 
Hoshen-Kopelman algorithm, the new algorithm, and Tar- 
jan's depth-first-search algorithm, respectively. 



FIG. 2. Flowchart of our algorithm. A line between an 
array (parallelogram) and a process (rectangle) indicates that 
the array is used in the process. If the line is dashed, the array 
changes in the process. 
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FIG. 3. A square lattice of linear size L = 2 with crossing 
bonds (oc and bd). Filled circles a,b,c,d denote sites. The 
smallest trianglular areas 1, 2, 3, 4 denote plaquettes. Bond ac 
has two pairs of side plaquettes [(1,2) and (3,4)]. Bond bd 
has two pairs of side plaquettes [(1, 4) and (2, 3)]. If any pair 
of side plaquettes of a crossing bond have the same label, the 
crossing bond can be closed. 
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FIG. 5. Log-Log plot of Qoo and Poo for site percola- 
tion at p = p' s ^ = 0.5927. The dotted lines are the re- 
sults of our weighted-least-squares fits against the functions 
Qoc{L) = aL~ q l u and Poo = &L~ /3 /" , giving values of 
q/v = 0.3647 ± 0.0039 and (3/v = 0.1068 ± 0.0013. 
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